Contention for Critical Sections Can Reduce Performance and Scalability by Causing Thread Serialization. The Proposed Accelerated Critical Sections Mechanism Reduces This Limitation. ACS Executes Critical Sections on the High-Performance Core of an Asymmetric Chip Multiprocessor
Abstract
Extracting high performance from chip multiprocessors (CMPs) requires partitioning the application into threads that execute concurrently on multiple cores. Because threads cannot be allowed to update shared data concurrently, accesses to shared data are encapsulated inside critical sections. Only one thread executes a critical section at a given time; other threads wanting to execute the same critical section must wait. Critical sections can therefore serialize threads, reducing performance and scalability (that is, the number of threads at which performance saturates). Shortening the execution time inside critical sections can reduce this performance loss.

This article proposes the accelerated critical sections (ACS) mechanism. ACS is based on the asymmetric chip multiprocessor (ACMP), which consists of at least one large, high-performance core and many small, power-efficient cores (see the “Related work” sidebar for other work in this area). The ACMP was originally proposed to run Amdahl’s serial bottleneck (where only a single thread exists) more quickly on the large core and to run the parallel program regions on the multiple small cores. In addition to Amdahl’s bottleneck, ACS runs selected critical sections on the large core, which executes them faster than the small cores can. ACS dedicates the large core exclusively to running critical sections (and Amdahl’s bottleneck).

In conventional systems, when a core encounters a critical section, it acquires the lock for the critical section, executes the critical section, and releases the lock. In ACS, when a small core encounters a critical section, it sends a request to the large core for execution of that critical section and stalls. The large core executes the critical section and notifies the small core when it has completed it. The small core then resumes execution.
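The request, stall, and notify protocol described above can be illustrated with a software analogy. The Python sketch below rests on stated assumptions: threads stand in for cores, one dedicated server thread plays the large core, and the names `LargeCore`, `execute`, and `shutdown` are hypothetical, not part of ACS. It models neither the large core's higher single-thread performance nor any hardware detail of ACS; it only shows that when every critical section is shipped to a single executor, the small cores get mutual exclusion without acquiring a lock themselves.

```python
import queue
import threading
from typing import Any, Callable


class LargeCore:
    """Hypothetical software model of the ACS large core: one dedicated thread
    that executes every critical section shipped to it, one request at a time."""

    _STOP = object()  # sentinel telling the server loop to exit

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()  # FIFO of pending requests
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self) -> None:
        while True:
            request = self._requests.get()
            if request is LargeCore._STOP:
                return
            body, done, outcome = request
            try:
                outcome["value"] = body()  # the critical section runs on this thread only
            except Exception as exc:  # hand failures back to the requesting thread
                outcome["error"] = exc
            finally:
                done.set()  # notify the small core that its critical section completed

    def execute(self, body: Callable[[], Any]) -> Any:
        """Small-core side: send the request, stall until notified, then resume."""
        done = threading.Event()
        outcome: dict = {}
        self._requests.put((body, done, outcome))
        done.wait()  # the requesting thread stalls here
        if "error" in outcome:
            raise outcome["error"]
        return outcome.get("value")

    def shutdown(self) -> None:
        """Stop the server thread once all earlier requests have been served."""
        self._requests.put(LargeCore._STOP)
        self._thread.join()


# Usage: eight "small cores" update a shared counter without taking a lock.
counter = 0


def increment() -> None:
    global counter
    counter += 1  # shared-data update; only the LargeCore thread ever runs this


def small_core(core: LargeCore, iterations: int) -> None:
    for _ in range(iterations):
        # ...non-critical, parallel work would run here on the small core...
        core.execute(increment)


large = LargeCore()
workers = [threading.Thread(target=small_core, args=(large, 1000)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
large.shutdown()
print(counter)  # 8000
```

Because one thread runs every shipped critical section, this sketch also serializes critical sections that protect unrelated data; that is a simplification of the model, not a claim about how ACS schedules requests.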
By accelerating critical section execution, ACS reduces serialization, lowering the likelihood of threads waiting for a critical section to finish. Our evaluation on a set of 12 critical-section-intensive workloads shows that ACS reduces the average execution time by 34 percent compared to an equal-area 32-core …
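Why shortening critical sections raises the saturation point can be seen in a deliberately simple bottleneck model. This model is an assumption for illustration, not the evaluation methodology behind the reported results: each of N threads repeatedly spends P time units of parallel work and C time units inside one shared critical section, and running the critical section S times faster shrinks C to C/S. Throughput is then bounded by min(N / (P + C/S), S / C), and it stops growing at N* = (P + C/S) / (C/S) = 1 + S·P/C threads. The function names and the numbers below are hypothetical, and the model ignores lock hand-off and request-transfer costs as well as the fact that, in an equal-area design, the large core occupies area that could otherwise hold several small cores.

```python
def throughput(n_threads: int, parallel: float, critical: float,
               cs_speedup: float = 1.0) -> float:
    """Idealized upper bound on loop iterations completed per unit time.

    Each thread repeatedly spends `parallel` time units outside a single shared
    critical section and `critical` time units inside it. `cs_speedup` > 1 models
    running the critical section on a faster core. Lock hand-off and
    request-transfer costs are ignored.
    """
    cs_time = critical / cs_speedup
    parallel_bound = n_threads / (parallel + cs_time)  # every thread runs without waiting
    serial_bound = 1.0 / cs_time  # the critical section admits one thread at a time
    return min(parallel_bound, serial_bound)


def saturation_point(parallel: float, critical: float, cs_speedup: float = 1.0) -> float:
    """Thread count at which the two bounds meet and throughput stops growing."""
    cs_time = critical / cs_speedup
    return (parallel + cs_time) / cs_time


# Example: 9 units of parallel work and 1 unit of critical section per iteration.
for speedup in (1.0, 2.0):
    print(speedup, saturation_point(9.0, 1.0, speedup), throughput(32, 9.0, 1.0, speedup))
# 1.0 10.0 1.0
# 2.0 19.0 2.0
```

Under these assumptions, adding threads beyond the saturation point no longer helps, while halving the time spent inside the critical section doubles the ceiling it imposes; shortening that time is the effect ACS targets by running critical sections on the large core.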
Publication date: 2010